I have been using my GeForce 1060 extensively for deep learning, both with Python and R. But after the always painful dance with the closed source drivers and kernel updates, paired with the collapse of my computer's PSU and/or GPU, I decided to finally switch to an AMD graphics card and the open source stack. And you know what, within half a day I had everything running, including Tensorflow. Yay for Open Source!
Preliminaries
So what is the starting point: I am running Debian/unstable with an AMD Radeon 5700. First of all I purged all NVIDIA related packages, and there are a lot of them, I have to say. Be sure to search for nv and nvidia and get rid of all matching packages. For safety I rebooted and checked again that no NVIDIA related kernel modules were loaded.
Firmware
Debian ships the package firmware-amd-graphics, but this is not enough for the current kernel and current hardware. It is better to clone
git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
and copy everything from the amdgpu directory to /lib/firmware/amdgpu.
I didn't do that at first, and then the kernel hung during boot at the switch to the AMD framebuffer. If you see this behaviour, your firmware is too old.
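To see which firmware files actually need copying, the two directories can be compared before touching anything. The following is only a minimal sketch: the helper name stale_firmware and the checkout location linux-firmware/amdgpu are assumptions for illustration, and it only reports differences, it does not copy.

```python
import filecmp
from pathlib import Path


def stale_firmware(upstream_dir, installed_dir):
    """Return names of upstream files that are missing or differ in installed_dir."""
    upstream = Path(upstream_dir)
    installed = Path(installed_dir)
    stale = []
    for src in sorted(upstream.iterdir()):
        if not src.is_file():
            continue
        dst = installed / src.name
        # shallow=False forces a byte-by-byte comparison instead of trusting size/mtime.
        if not dst.is_file() or not filecmp.cmp(src, dst, shallow=False):
            stale.append(src.name)
    return stale


if __name__ == "__main__":
    checkout = Path("linux-firmware/amdgpu")  # assumed location of the git checkout
    target = Path("/lib/firmware/amdgpu")
    if checkout.is_dir():
        for name in stale_firmware(checkout, target):
            print("needs update:", name)
    else:
        print("no checkout found at", checkout)
```

An empty result means the installed firmware already matches the checkout.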
Kernel
The advantage of having an open source driver that lives in the kernel is that you don't have to worry about incompatibilities (like the NVIDIA driver needing patches every time a new kernel comes out). For recent AMD GPUs you need a rather new kernel; I have 5.6.0 and 5.7.0-rc5 running. Make sure that you have all the necessary kernel config options turned on if you compile your own kernels. In my case this is
CONFIG_DRM_AMDGPU=m
CONFIG_DRM_AMDGPU_USERPTR=y
CONFIG_DRM_AMD_ACP=y
CONFIG_DRM_AMD_DC=y
CONFIG_DRM_AMD_DC_DCN=y
CONFIG_HSA_AMD=y
When installing the kernel, be sure that the firmware is already updated so that the correct firmware is copied into the initrd.
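If you want to double-check a kernel config against the options listed above, something along these lines could do it. This is a sketch under assumptions: the function names are made up for illustration, and the path /boot/config-<release> is where Debian kernels put their config, while self-compiled kernels may keep it elsewhere. An option counts as enabled when it is set to either y or m.

```python
import platform
from pathlib import Path

# The options from the list above.
REQUIRED = [
    "CONFIG_DRM_AMDGPU",
    "CONFIG_DRM_AMDGPU_USERPTR",
    "CONFIG_DRM_AMD_ACP",
    "CONFIG_DRM_AMD_DC",
    "CONFIG_DRM_AMD_DC_DCN",
    "CONFIG_HSA_AMD",
]


def parse_config(text):
    """Map option names to values; '# CONFIG_X is not set' lines are skipped."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            options[key] = value
    return options


def missing_options(text, required=REQUIRED):
    """Return the required options that are neither built in (y) nor a module (m)."""
    options = parse_config(text)
    return [name for name in required if options.get(name) not in ("y", "m")]


if __name__ == "__main__":
    config = Path("/boot") / f"config-{platform.release()}"  # assumed location
    if config.is_file():
        print("missing:", missing_options(config.read_text()))
    else:
        print("no config found at", config)
```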
Support programs and libraries
All of the following is more or less an excerpt from the ROCm Installation Guide!
AMD provides a Debian/Ubuntu APT repository for software as well as kernel sources. Put the following into /etc/apt/sources.list.d/rocm.list:
deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main
and also put the public key of the rocm repository into /etc/apt/trusted.gpg.d/rocm.asc.
After that, apt-get update should work.
I installed rocm-dev-3.3.0, rocm-libs-3.3.0, hipcub-3.3.0, and miopen-hip-3.3.0 (and of course the dependencies), but not rocm-dkms, which is the kernel module. If you have a sufficiently recent kernel (see above), the source in the kernel itself is newer.
The libraries and programs are installed under /opt/rocm-3.3.0, and to make the libraries available to Tensorflow (see below) and other programs, I added /etc/ld.so.conf.d/rocm.conf with the following content:
and ran ldconfig as root.
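Which directories belong into such a file depends on where the packages put their shared objects. As a hedged sketch (the helper name library_dirs is made up for illustration, and nothing here is taken from the ROCm documentation), the install prefix can be scanned for directories that contain shared libraries, which gives candidate lines for a file under /etc/ld.so.conf.d:

```python
from pathlib import Path


def library_dirs(prefix):
    """Return sorted directories below prefix that directly contain shared objects."""
    root = Path(prefix)
    if not root.is_dir():
        return []
    found = set()
    for path in root.rglob("*"):
        name = path.name
        # Match libfoo.so as well as versioned names such as libfoo.so.1.2
        if path.is_file() and (name.endswith(".so") or ".so." in name):
            found.add(str(path.parent))
    return sorted(found)


if __name__ == "__main__":
    # Prefix taken from the text above; prints one candidate line per directory.
    for directory in library_dirs("/opt/rocm-3.3.0"):
        print(directory)
```

The output is only a list of candidates. It is worth checking by hand which of them are actually needed before writing them into the configuration file.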
Last but not least, add a udev rule that is normally installed by rocm-dkms. Put the following into /etc/udev/rules.d/70-kfd.rules:
SUBSYSTEM=="kfd", KERNEL=="kfd", TAG+="uaccess", GROUP="video"
This allows users in the video group to access the GPU.
At this point you should be able to boot into the system and have X running on top of the AMD GPU, including OpenGL acceleration and direct rendering:
$ glxinfo
name of display: :0
display: :0 screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.4
...
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
...
Tensorflow
Thinking about how hard it was to get the correct libraries to get Tensorflow running on GPUs (see here and here), it is a pleasure to see that with open source all this pain is relieved.
There is already work done to make Tensorflow run on ROCm, the tensorflow-rocm project. They provide up-to-date PyPI packages, so a simple
pip3 install tensorflow-rocm
is enough to get Tensorflow running with Python:
>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
2020-05-14 12:07:19.590169: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libhip_hcc.so
...
2020-05-14 12:07:19.711478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7444 MB memory) -> physical GPU (device: 0, name: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT], pci bus id: 0000:03:00.0)
3
>>>
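Instead of reading the log lines, one can also ask Tensorflow directly which GPUs it sees. A minimal sketch, assuming TensorFlow 2.1 or later (where tf.config.list_physical_devices is available); the wrapper function visible_gpus is just a name chosen for this example:

```python
def visible_gpus():
    """Return the names of GPUs Tensorflow can see, or None if Tensorflow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Each entry is a PhysicalDevice; its name looks like '/physical_device:GPU:0'.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("tensorflow is not installed")
    elif gpus:
        print("GPUs visible to Tensorflow:", gpus)
    else:
        print("Tensorflow is installed but sees no GPU")
```

An empty list would suggest that the ROCm libraries are not found or that the user has no access to the device.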
Tensorflow for R
Installation is trivial again since there is a tensorflow for R package, just run (as a user that is in the group staff, which normally owns /usr/local/lib/R)
$ R
...
> install.packages("tensorflow")
...
Do not call the R function install_tensorflow(), since Tensorflow is already installed and functional!
With that done, R can use the AMD GPU for computations:
$ R
...
> library(tensorflow)
> tf$constant("Hellow Tensorflow")
2020-05-14 12:14:24.185609: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libhip_hcc.so
...
2020-05-14 12:14:24.277736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7444 MB memory) -> physical GPU (device: 0, name: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT], pci bus id: 0000:03:00.0)
tf.Tensor(b'Hellow Tensorflow', shape=(), dtype=string)
>
AMD Vulkan
From the Vulkan home page:
Vulkan is a new generation graphics and compute API that provides high-efficiency, cross-platform access to modern GPUs used in a wide variety of devices from PCs and consoles to mobile phones and embedded platforms.
Several games use the Vulkan API if available, and it is said to be more efficient.
There are Vulkan libraries for Radeon shipped with Mesa, in the Debian package mesa-vulkan-drivers, but my guess is that they are a bit outdated.
The AMDVLK project provides the latest version, and to my surprise it was rather easy to install, again by following the advice in their README. The steps are basically (always follow what is written for Ubuntu):
- Install the necessary dependencies
- Install the Repo tool
- Get the source code
- Make 64-bit and 32-bit builds
- Copy driver and JSON files (see below for what I did differently!)
All as described in the linked README. Just to make sure, I removed the JSON files /usr/share/vulkan/icd.d/radeon* shipped by Debian's mesa-vulkan-drivers package.
Finally I deviated a bit by not editing the file /usr/share/X11/xorg.conf.d/10-amdgpu.conf, but instead copying it to /etc/X11/xorg.conf.d/10-amdgpu.conf and adding the following section there:
Section "Device"
    Identifier "AMDgpu"
    Option "DRI" "3"
EndSection
To be honest, I did not follow the Copy driver and JSON files step literally, since I don't want to copy self-made files into system directories under /usr/lib. So what I did is:
- Copied the driver files to /opt/amdvlk/lib, so I now have /opt/amdvlk/lib/i386-linux-gnu/amdvlk32.so and /opt/amdvlk/lib/x86_64-linux-gnu/amdvlk64.so
- Adjusted the location of the driver file in the two JSON files /etc/vulkan/icd.d/amd_icd32.json and /etc/vulkan/icd.d/amd_icd64.json (which were installed above under Copy driver and JSON files)
- Added a file /etc/ld.so.conf.d/amdvlk.conf containing the two lines:
/opt/amdvlk/lib/i386-linux-gnu
/opt/amdvlk/lib/x86_64-linux-gnu
With this in place, I don't pollute the system directories, and the new Vulkan driver is still available.
But honestly, I don't really know whether it is used and working, because I don't know how to check.
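One partial check is to read the ICD manifests and verify that the driver library they point to really exists. This sketch assumes the usual Vulkan ICD manifest layout (an "ICD" object with a "library_path" entry); the function name icd_libraries is made up for this example. It only proves that the manifests are consistent, not that the loader picks this driver. For that, a tool such as vulkaninfo, which reports the driver in use, is probably the better test.

```python
import json
from pathlib import Path


def icd_libraries(icd_dir):
    """Map each ICD manifest name to (library_path, whether that file exists)."""
    result = {}
    for manifest in sorted(Path(icd_dir).glob("*.json")):
        data = json.loads(manifest.read_text())
        library = data.get("ICD", {}).get("library_path")
        if library is None:
            continue
        # The existence check is only meaningful for absolute paths; a bare
        # file name would be resolved by the dynamic linker instead.
        result[manifest.name] = (library, Path(library).is_file())
    return result


if __name__ == "__main__":
    # Directory taken from the text above.
    for name, (library, exists) in icd_libraries("/etc/vulkan/icd.d").items():
        print(name, "->", library, "(found)" if exists else "(missing)")
```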
With all that in place, I can run my usual set of Steam games (The Long Dark, Shadow of the Tomb Raider, The Talos Principle, Supraland) and I haven't seen any visual problems so far. As a bonus, KDE/Plasma is now running much better, since NVIDIA and KDE have traditionally had some incompatibilities.
The above might sound like a lot of stuff to do, but considering that most of the parts are not really packaged within Debian, and that all this is a rather new open source stack, I was surprised that I got everything working smoothly in half a day.
Thanks to all the developers who have worked hard to make this all possible.